An information theoretic view of gapped and other alignments.

Author

  • J P Schmidt
Abstract

We use an information-theoretic framework to estimate the probability of gapped alignment scores. With appropriate scaling, the score of a global alignment of two sequences (and, with some adjustments, the score of a local alignment) can be viewed as the difference in the number of bits needed to transmit the two sequences T1 and T2 under two different encoding schemes, C1 and C2. C1 is an idealized scheme, assumed to achieve an optimal encoding with respect to a distribution p under the assumption that T1 and T2 are independent. C2 is an alternate scheme that transmits T1 and T2 while taking advantage of the optimal alignment between the two. That is, under C1 the strings T1 and T2 (with respective probabilities p(T1) and p(T2)) are assumed to be encoded using C1(T1, T2) = log(1 / (p(T1) p(T2))) bits. By slightly modifying a known theorem, we show that the probability (under p) that two independent sequences T1 and T2 can be transmitted with an alternate encoding scheme C2 using no more than C1(T1, T2) - r bits is bounded by 2^(-r). We then show how to use this bound to derive upper bounds for the probability of gapped alignment scores between two sequences.
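The 2^(-r) bound can be checked numerically on a toy case. The sketch below is an illustration, not the paper's method. It assumes a four-letter alphabet, an i.i.d. distribution p, and a hypothetical alternate code whose idealized lengths are -log2 q(x) for some other distribution q, so the Kraft inequality holds. The string x stands in for the pair (T1, T2), and the names tail_mass, p and q are illustrative only.

```python
import itertools
import math


def tail_mass(p, q, n, r):
    """Probability under p that the alternate code saves at least r bits.

    p, q: per-symbol probabilities (i.i.d. models over the same alphabet).
    n:    length of the string being transmitted.
    r:    required saving in bits.
    """
    mass = 0.0
    for x in itertools.product(range(len(p)), repeat=n):
        px = math.prod(p[s] for s in x)
        qx = math.prod(q[s] for s in x)
        c1 = -math.log2(px)  # idealized length under the reference scheme C1
        c2 = -math.log2(qx)  # idealized length under the alternate scheme C2
        if c2 <= c1 - r:
            mass += px
    return mass


# Assumed toy distributions: uniform reference, skewed alternate.
p = [0.25, 0.25, 0.25, 0.25]
q = [0.7, 0.1, 0.1, 0.1]

if __name__ == "__main__":
    for r in range(1, 8):
        print(r, tail_mass(p, q, 6, r), 2.0 ** -r)
```

The bound holds here because c2 <= c1 - r implies p(x) <= 2^(-r) q(x), and q sums to at most 1 over any set of strings. Any other choice of q would give the same guarantee.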

Similar resources

The Effect of Grammar vs. Vocabulary Pre-teaching on EFL Learners’ Reading Comprehension: A Schema-Theoretic View of Reading

This study was designed to investigate the effect of grammar and vocabulary pre-teaching, as two types of pre-reading activities, on the Iranian EFL learners’ reading comprehension from a schema–theoretic perspective. The sample consisted of 90 female students studying at pre-university centers of Isfahan.  The subjects were randomly divided into three equal-in-number groups. They participated ...

A direct method for computing extreme value (Gumbel) parameters for gapped biological sequence alignments

We develop a general method for computing extreme value distribution (Gumbel, 1958) parameters for gapped alignments. Our approach uses mixture distribution theory to obtain associated BLOSUM matrices for gapped alignments, which in turn are used for determining significance of gapped alignment scores for pairs of biological sequences. We compare our results with parameters already obtained in ...

A New Middle Path Approach For Alignements In Blast

This paper deals with a new middle path approach developed for reducing alignment calculations in BLAST algorithm. This is a new step which is introduced in BLAST algorithm in between the ungapped and gapped alignments. This step of middle path approach between the ungapped and gapped alignments reduces the number of sequences going for gapped alignment. This results in the improvement in speed...

An Information-Theoretic Discussion of Convolutional Bottleneck Features for Robust Speech Recognition

Convolutional Neural Networks (CNNs) have shown their performance in speech recognition systems for feature extraction and acoustic modeling. In addition, CNNs have been used for robust speech recognition, and competitive results have been reported. A Convolutive Bottleneck Network (CBN) is a kind of CNN that has a bottleneck layer among its fully connected layers. The bottleneck fea...

Mclip: motif detection based on cliques of gapped local profile-to-profile alignments

A multitude of motif-finding tools have been published, which can generally be assigned to one of three classes: expectation-maximization, Gibbs-sampling or enumeration. Irrespective of this grouping, most motif detection tools only take into account similarities across ungapped sequence regions, possibly causing short motifs located peripherally and in varying distance to a 'core' m...

Journal:
  • Pacific Symposium on Biocomputing

Publication date: 1998